This experiment compares the performance of Amazon Forecast's automated solutions against classical statistical models from StatsForecast on the M5 and M4 datasets.
This notebook describes the data used in the experiment.
import pandas as pd
from statsforecast import StatsForecast
M5 dataset
Target data
# Reading s3:// paths with pandas requires the s3fs package
train_df = pd.read_parquet('s3://m5-benchmarks/data/train/target.parquet')
train_df.head()
   item_id           timestamp   demand
0  FOODS_1_001_CA_1  2011-01-29  3.0
1  FOODS_1_001_CA_1  2011-01-30  0.0
2  FOODS_1_001_CA_1  2011-01-31  0.0
3  FOODS_1_001_CA_1  2011-02-01  1.0
4  FOODS_1_001_CA_1  2011-02-02  4.0
# StatsForecast expects long-format data with the columns unique_id, ds and y
train_df = train_df.rename(columns={'item_id': 'unique_id',
                                    'timestamp': 'ds',
                                    'demand': 'y'})
StatsForecast.plot(train_df)
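The zeros in the target rows above suggest that some M5 series have intermittent demand. A minimal sketch of how one might quantify this per series, assuming the renamed `unique_id`/`ds`/`y` columns. The toy frame rebuilds only the five rows shown above, and `zero_demand_share` is a hypothetical helper, not part of StatsForecast.

```python
import pandas as pd

# Toy frame rebuilt from the five target rows shown above (not the full dataset).
toy_target = pd.DataFrame({
    "unique_id": ["FOODS_1_001_CA_1"] * 5,
    "ds": pd.to_datetime(["2011-01-29", "2011-01-30", "2011-01-31",
                          "2011-02-01", "2011-02-02"]),
    "y": [3.0, 0.0, 0.0, 1.0, 4.0],
})


def zero_demand_share(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fraction of rows with zero demand, per series."""
    # eq(0) marks zero-demand days; the mean of booleans is their share.
    return df["y"].eq(0).groupby(df["unique_id"]).mean()


share = zero_demand_share(toy_target)
print(share)
```

On the full data, `zero_demand_share(train_df).describe()` would summarize how intermittent the series are.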
Static variables
static_df = pd.read_parquet('s3://m5-benchmarks/data/train/static.parquet')
static_df.head()
   item_id           sku_id       dept_id  cat_id  store_id  state_id
0  FOODS_1_001_CA_1  FOODS_1_001  FOODS_1  FOODS   CA_1      CA
1  FOODS_1_001_CA_2  FOODS_1_001  FOODS_1  FOODS   CA_2      CA
2  FOODS_1_001_CA_3  FOODS_1_001  FOODS_1  FOODS   CA_3      CA
3  FOODS_1_001_CA_4  FOODS_1_001  FOODS_1  FOODS   CA_4      CA
4  FOODS_1_001_TX_1  FOODS_1_001  FOODS_1  FOODS   TX_1      TX
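In the rows above, every static column can be read off `item_id`, which joins the SKU and the store identifiers. The sketch below assumes the naming pattern `CATEGORY_DEPT_ITEM_STATE_STORE`, inferred from these rows only, holds across the dataset; `parse_m5_item_id` is a hypothetical helper.

```python
import pandas as pd


def parse_m5_item_id(item_id: str) -> dict:
    """Hypothetical helper: split an id such as 'FOODS_1_001_CA_1' into the
    hierarchy levels shown in static.parquet.

    Assumes exactly five underscore-separated tokens; any other shape
    raises ValueError during unpacking.
    """
    cat, dept_num, item_num, state, store_num = item_id.split("_")
    return {
        "item_id": item_id,
        "sku_id": f"{cat}_{dept_num}_{item_num}",
        "dept_id": f"{cat}_{dept_num}",
        "cat_id": cat,
        "store_id": f"{state}_{store_num}",
        "state_id": state,
    }


ids = ["FOODS_1_001_CA_1", "FOODS_1_001_CA_2", "FOODS_1_001_TX_1"]
parsed = pd.DataFrame([parse_m5_item_id(i) for i in ids])
print(parsed)
```

Comparing such a parsed frame with `static_df` would be a quick way to check whether the assumed pattern holds on the full data.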
Temporal variables
temporal_df = pd.read_parquet('s3://m5-benchmarks/data/train/temporal.parquet')
temporal_df.head()
   item_id           timestamp   snap_CA  snap_TX  snap_WI  sell_price
0  FOODS_1_001_CA_1  2011-01-29  0.0      0.0      0.0      2.0
1  FOODS_1_001_CA_1  2011-01-30  0.0      0.0      0.0      2.0
2  FOODS_1_001_CA_1  2011-01-31  0.0      0.0      0.0      2.0
3  FOODS_1_001_CA_1  2011-02-01  1.0      1.0      0.0      2.0
4  FOODS_1_001_CA_1  2011-02-02  1.0      0.0      1.0      2.0
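temporal_df = temporal_df.rename(columns={'item_id': 'unique_id',
                                          'timestamp': 'ds'})
# temporal_df is passed in the forecasts_df position, so its columns
# (SNAP flags and sell price) are drawn alongside the target series
StatsForecast.plot(train_df, temporal_df)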
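To use the temporal variables as exogenous regressors, one would typically align them with the target on `unique_id` and `ds`. A minimal sketch with pandas `merge`, using toy frames rebuilt from the five target and temporal rows shown above; the left join (keep every target row) is an assumption, and whether the experiment performs this join is not shown in this notebook.

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-29", "2011-01-30", "2011-01-31",
                        "2011-02-01", "2011-02-02"])
uid = ["FOODS_1_001_CA_1"] * 5

# Toy frames rebuilt from the target and temporal rows shown above.
target = pd.DataFrame({"unique_id": uid, "ds": dates,
                       "y": [3.0, 0.0, 0.0, 1.0, 4.0]})
temporal = pd.DataFrame({
    "unique_id": uid,
    "ds": dates,
    "snap_CA": [0.0, 0.0, 0.0, 1.0, 1.0],
    "snap_TX": [0.0, 0.0, 0.0, 1.0, 0.0],
    "snap_WI": [0.0, 0.0, 0.0, 0.0, 1.0],
    "sell_price": [2.0] * 5,
})

# Left join on series key and date: every target row is kept, and the
# temporal columns would be NaN for any date missing from `temporal`.
merged = target.merge(temporal, on=["unique_id", "ds"], how="left")
print(merged)
```

On the full data the same call would be `train_df.merge(temporal_df, on=['unique_id', 'ds'], how='left')`, assuming the `ds` columns of both frames share the same dtype.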
M4 Daily dataset
train_df = pd.read_parquet('s3://m4-benchmarks/data/train/target.parquet')
train_df.head()
   item_id  timestamp   target_value
0  D1       2019-03-18  1017.1
1  D1       2019-03-19  1019.3
2  D1       2019-03-20  1017.0
3  D1       2019-03-21  1019.2
4  D1       2019-03-22  1018.7
train_df = train_df.rename(columns={'item_id': 'unique_id',
                                    'timestamp': 'ds',
                                    'target_value': 'y'})
StatsForecast.plot(train_df)
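Since `train_df` now holds the M4 Daily series, a per-series summary (number of observations and date span) is a quick way to check whether series lengths differ before forecasting. A minimal sketch, assuming the renamed columns; the toy frame reproduces the five `D1` rows above, and `series_summary` is a hypothetical helper that would apply unchanged to the M5 target.

```python
import pandas as pd

# Toy frame reproducing the five D1 rows shown above.
toy_m4 = pd.DataFrame({
    "unique_id": ["D1"] * 5,
    "ds": pd.date_range("2019-03-18", periods=5, freq="D"),
    "y": [1017.1, 1019.3, 1017.0, 1019.2, 1018.7],
})


def series_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: observation count and date span per series."""
    return df.groupby("unique_id").agg(
        n_obs=("y", "count"),  # non-missing observations
        start=("ds", "min"),
        end=("ds", "max"),
    )


summary = series_summary(toy_m4)
print(summary)
```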